A stopping rule, given data, is informative relative to parameters
of interest if it is random and statistically dependent on those
parameters. Practical examples considered in detail illuminate the
role of informative stopping rules and show how they may arise in
practice. The discussion is based on the Bayesian approach.




INFORMATIVE  STOPPING  RULES 
by 

Richard  E.  Barlow  and  S.  W.  W.  Shor 

Raiffa and Schlaifer (1961) discuss noninformative stopping rules in
their book. However, the role and importance of informative stopping rules,
especially relative to censored data, are not made clear. The following
examples and discussion clarify the role of informative stopping rules in
data analysis.

In recording or extracting information for statistical analysis, some
rule or set of instructions must be employed, either explicitly or implicitly,
in order to terminate the recording or information extraction procedure.

For example, records on fossil fuel electrical power plants were searched
for the frequency and duration of forced outages exceeding 60 days.
Since the records were tabulated by quarter of a year, all outages exceeding
30 days in a quarter were extracted from the record. If an outage exceeding
30 days was still in effect at the end of a quarter, the following
quarter was searched to complete the record for that particular outage. If
an outage exceeded 30 days from the start of a quarter, the previous quarter
was likewise searched to complete the record for that particular outage. By
following this procedure, we could be sure that no 60 day or greater outage
was missed. Relative to 60 day or greater outages, this particular stopping
rule, given the data, was noninformative with respect to model parameters.
All of our information about model parameters was contained in the number
and durations of 60 day or greater outages, none of which were missed.

However, it subsequently became necessary to use the same extracted
data to assess the frequency and duration of 30 day or greater forced
outages. Relative to these outages, our search procedure, and hence our
stopping rule, almost surely missed some 30 day or greater outages in the
record: an outage that straddles the end of a quarter can exceed 30 days
in total without exceeding 30 days in either quarter. See Figure 1. The
missed outages constitute an unobserved nuisance parameter whose
distribution depends on both the stopping rule and the unknown model
parameters of interest. Given observed 30 day or greater outages and the
knowledge that some could have been missed, our stopping rule was now
informative relative to unknown model parameters defining the
probability distribution for outage durations exceeding 30 days.
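
The way the extraction rule can miss an outage may be sketched in a few lines of Python. This is only an illustration under simplifying assumptions of our own: quarters are taken to be exactly 91 days long, time is measured in days from the start of the record, and the function names `overlap_by_quarter` and `is_extracted` are hypothetical.

```python
QUARTER_DAYS = 91.0   # assumed quarter length in days (illustrative only)
THRESHOLD = 30.0      # an outage is extracted if it exceeds this within one quarter


def overlap_by_quarter(start, duration, quarter_days=QUARTER_DAYS):
    """Return the number of outage days falling in each quarter the outage touches."""
    end = start + duration
    overlaps = []
    q = int(start // quarter_days)          # index of the quarter containing the start
    while q * quarter_days < end:
        lo = max(start, q * quarter_days)
        hi = min(end, (q + 1) * quarter_days)
        overlaps.append(hi - lo)
        q += 1
    return overlaps


def is_extracted(start, duration, threshold=THRESHOLD):
    """True if the outage exceeds the threshold within at least one quarter."""
    return any(days > threshold for days in overlap_by_quarter(start, duration))


# A 45 day outage starting 20 days before the end of a quarter is split 20/25
# between two quarters and is therefore never extracted.
print(is_extracted(71, 45))   # missed
print(is_extracted(10, 45))   # caught: all 45 days fall in one quarter
print(is_extracted(71, 61))   # caught: more than 30 days fall in the second quarter
```

Under these assumptions an outage longer than 60 days always exceeds 30 days in at least one quarter, which is why the rule is harmless for the 60 day question and not for the 30 day question.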


FIGURE 1

EXAMPLE OF A MISSED OUTAGE EXCEEDING 30 DAYS
(Timeline labels: Outage Begins, End of Quarter, Outage Ends.)


1. DEFINITIONS

Suppose a unit lifetime (or downtime) duration, X, depends on unknown
parameters $\theta = (\theta_1, \theta_2, \ldots, \theta_s)$. Observation
on a unit may stop before a unit lifetime (or downtime) duration is
observed. Let STOP be a rule or a set of instructions which determines
when observation of a unit stops. STOP may be random and dependent on
unknown parameters. The stopping rule is not necessarily the same as the
"stopping time."

The stopping rule discussed in the Introduction was: "Extract a
downtime duration from a quarterly record if it exceeds 30 days, otherwise
ignore it." Consequently, relative to inference about 30 day downtime
duration probability parameters, this stopping rule is random, since
whether any particular unit downtime is observed is random.

Definition:

A stopping rule, STOP, is noninformative relative to model parameters
$\theta = (\theta_1, \theta_2, \ldots, \theta_s)$ if STOP is statistically
independent of $\theta$, given data; i.e.,

$$\text{STOP} \perp \theta \mid \text{Data}.$$

Another way of saying this is that the posterior density for $\theta$,
given the data, is the same as the posterior density for $\theta$ given
the data and the stopping rule; i.e.,

$$\pi(\theta \mid \text{Data}) = \pi(\theta \mid \text{Data}, \text{STOP})$$

for all $\theta$.
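
The equivalence of the two statements can be seen from Bayes' theorem. The following short derivation is our own sketch; the notation $P(\text{STOP} \mid \text{Data}, \theta)$ for the conditional probability of the stopping rule's outcome is an assumption introduced here for illustration.

```latex
\pi(\theta \mid \text{Data}, \text{STOP})
  = \frac{P(\text{STOP} \mid \text{Data}, \theta)\, \pi(\theta \mid \text{Data})}
         {\int P(\text{STOP} \mid \text{Data}, \theta')\, \pi(\theta' \mid \text{Data})\, d\theta'} .
% If P(STOP | Data, theta) does not depend on theta, it factors out of the
% integral and cancels, leaving
\pi(\theta \mid \text{Data}, \text{STOP}) = \pi(\theta \mid \text{Data}) .
```

Conversely, when the factor $P(\text{STOP} \mid \text{Data}, \theta)$ varies with $\theta$, it acts as an extra likelihood term, and ignoring it changes the posterior.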

If the stopping rule is not random relative to the data, then it is
independent of $\theta$. In our example, the stopping rule was not random
relative to 60 day downtime durations, since none were missed. Consequently,
the stopping rule was noninformative in this case.

Formulas for calculating the likelihood have been developed for general
sampling plans in which the stopping rule is noninformative given data [cf.
Barlow and Proschan (1980), Theorem 1.7]. Most of the stopping rule examples
in the statistical literature concern noninformative stopping rules. An
exception is a paper by Roberts (1967) which presents an example based on fish
capture-recapture sampling methods. However, he comments that "interest in
exploiting the information in the stopping rule is likely to be great only
for very small sample sizes." Although his statement applies to his examples,
it is not true for the example we now discuss in detail.


2.  ANALYSIS  OF  AN  INFORMATIVE  STOPPING  RULE 

Consider the stopping rule, STOP, discussed in the introduction and
the list in Table 2.1 of outages 30 days or greater extracted from
quarterly records. There were k = 72 such outages found. However, due to
the stopping rule, some outages of this type were almost surely missed.

Since we are interested in the conditional probability distribution
of the excess over 30 days of such outages, we subtracted 720 = (24)(30)
hours from the listed duration hours in Table 2.1. Let $y_1, y_2, \ldots, y_k$
denote these excess downtimes. A transform of this data was then plotted
in Figure 2.1. Were the conditional distribution of excess durations
exponential, we would expect the plot to lie close to the 45 degree line
and, in fact, cross it. Since our plot exhibits this type of behavior and
since also the sample coefficient of variation is close to 1, we adopt an
exponential model

$$f(x \mid \theta) = (1/\theta)\, e^{-x/\theta}$$

for our conditional probability distribution of excess durations. The
rationale for this procedure is based on

(a) the relatively large sample size, k = 72;

(b) the total time on test plot is the maximum likelihood
estimate of a transform of the sample distribution.

Obviously, the exponential model is mathematically convenient and can be
justified if it provides a close approximation measure for our uncertainty
about conditional durations, given the data k and $y_1, \ldots, y_k$.
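
The two diagnostics mentioned above can be computed as follows. This is a minimal sketch, assuming the plotted transform is the usual scaled total time on test statistic; the function names are ours, and the four values used in the usage lines are simply the first four Table 2.1 durations less 720 hours, not the full data set.

```python
import statistics


def scaled_ttt(sample):
    """Return the points (i/n, T_i/T_n) of the scaled total time on test plot.

    T_i is the total time on test accumulated up to the i-th ordered
    observation. Assumes at least one observation is positive.
    """
    ys = sorted(sample)
    n = len(ys)
    totals, running, prev = [], 0.0, 0.0
    for j, y in enumerate(ys):
        running += (n - j) * (y - prev)   # n - j units are still under observation
        totals.append(running)
        prev = y
    return [(i / n, tot / running) for i, tot in enumerate(totals, start=1)]


def coefficient_of_variation(sample):
    """Sample standard deviation divided by the sample mean."""
    return statistics.stdev(sample) / statistics.mean(sample)


excess = [692, 298, 940, 574]   # illustrative subset only
for point in scaled_ttt(excess):
    print(point)
print(coefficient_of_variation(excess))
```

For exponential data the plotted points tend to scatter about the 45 degree line and the coefficient of variation tends to be near 1, which is the behavior the text appeals to.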
TABLE 2.1

FOSSIL UNITS 575 MW AND LARGER

                                          DOWNTIME
                                          DURATION
DATE        UNIT                          (HOURS)

Quarter 1 1976
2/18/76     Amos Unit 1                   1412
1/30/76     H. L. Bowen Unit 1            1018
2/22/76     Kincaid No. 2                 1660
2/07/76     Ninemile Point No. 4          1294

Quarter 2 1976
4/01/76     H. L. Bowen Unit 1            4390
4/01/76     Cardinal Unit 2                792
5/17/76     Monroe No. 1                   781
4/20/76     W. H. Sammis No. 6             733

Quarter 3 1976
7/20/76     Bowline Point Unit 1          1469
8/08/76     Kincaid No. 2                 1125

Quarter 4 1976
10/11/76    H. L. Bowen Unit 2             797
12/20/76    Ninemile Point No. 5          2755

Quarter 1 1977
3/07/77     Amos Unit 2                   2925
1/18/77     Chalk Point Unit 3             806
3/21/77     Cliffside Unit 5               996
2/28/77     Gorgas Unit 10                 720
2/05/77     Mohave Unit 2                 1161
2/14/77     Ninemile Point No. 4          2548
1/03/77     W. H. Sammis No. 6            4432

Quarter 2 1977
4/30/77     Astoria Project               3913
4/08/77     Baxter Wilson Unit 2           940
6/24/77     H. L. Bowen Unit 1            1053
5/27/77     Bowline Point Unit 2          1631
4/05/77     Oswego Unit 5                 1035

Quarter 3 1977
8/10/77     Belews Creek Unit 1            773
8/08/77     Chalk Point Unit 3             915
9/30/77     Chalk Point Unit 3             846
7/07/77     Sherburne Unit 1              1521

Quarter 4 1977
11/06/77    tee s Unit 2                   850
11/09/77    Baldwin Unit 2                 792
11/30/77    Cumberland Unit 1              766
11/23/77    Kincaid No. 1                 1928
11/15/77    La Cygne Unit 1                961
10/04/77    W. H. Sammis No. 7            1257

Quarter 1 1978
2/24/78     Cumberland Unit 1              851
1/30/78     Harrison Unit 2               3625
2/01/78     Mohave Unit 1                 3528
3/10/78     Ninemile Point No. 5          2216
1/09/78     W. H. Sammis No. 7            3109

Quarter 2 1978
5/05/78     Gaston Steam Plant Unit 5      864
5/15/78     Marshall No. 3                1408
5/19/78     Ninemile Point No. 4          2958
5/03/78     Tradinghouse Creek Unit 2     2188

Quarter 3 1978
7/01/78     Centralia Unit 1              1559
7/16/78     Ninemile Point No. 4          3557
9/29/78     Ormond Beach Unit 2            776

Quarter 4 1978
11/06/78    Keystone No. 1                 768
10/03/78    Oswego Unit 5                 1247

Quarter 1 1979
2/17/79     Conesville Unit 4             2411
1/01/79     Hatfield No. 1                4167
1/01/79     Hatfield No. 1                3320
1/01/79     Mt. Storm No. 1               2159
2/03/79     Paradise No. 1                3424
3/30/79     Ravenswood No. 3              2903

Quarter 2 1979
4/01/79     Baxter Wilson Unit 2          1672
6/28/79     Harrison Unit 2               3858
4/01/79     La Cygne Unit 2               1287
5/27/79     Mohave Unit 1                  792

Quarter 3 1979
7/26/79     Astoria Project               1034
8/06/79     Harrison Unit 1               5116
8/30/79     Hudson No. 2                   761
8/20/79     Keystone No. 1                1003
7/08/79     La Cygne Unit 1                920
7/23/79     Ormond Beach Unit 2            799
9/22/79     Pittsburg Unit 7              1573
9/29/79     W. F. Wyman Unit 4            1815

Quarter 4 1979
10/01/79    Centralia Unit 2              1278
11/22/79    Mt. Storm Unit 3              2586

Quarter 1 1980
1/12/80     Hatfield No. 1                1287

Quarter 2 1980
5/05/80     Belews Creek Unit 2           1360
5/14/80     Kincaid No. 1                  903
5/04/80     W. H. Sammis No. 7            1286

Relative to our data base, outages greater than 30 days are fairly
rare. Hence, given t = 492.5 unit years operating experience, our a
priori probability for observing k such outages is

$$P[N(t) = k \mid t, \lambda] = \frac{(\lambda t)^k e^{-\lambda t}}{k!},$$

where $\lambda$ is the expected number of such outages per unit year. We
suppose that any exchangeable collection of additional units will have
this same unknown rate $\lambda$.

3. LIKELIHOOD DERIVATION

Given t, STOP, and the data in Table 2.1, we need to calculate
the likelihood for $\theta$ and $\lambda$. Let $\ell$ be the calendar period
from January 1, 1976 to the end of the second quarter 1980 less 30 days, since
we would not have caught such outages beginning within 30 days of the end
of the second quarter 1980. Our a priori expectation for the number of
such outages occurring in $\ell$ is $\lambda t$, given $\lambda$. The
conditional probability that such an outage, having occurred in $\ell$,
will not be missed is

$$\left[1 - \frac{m\Delta}{\ell}\, p(\theta)\right]$$

where $m\Delta/\ell$ is the probability that such an outage occurs in a
$\Delta$ = 30 day interval preceding the end of a quarter and m = 17 is the
number of critical intervals. This probability is multiplied by $p(\theta)$,
the conditional probability that such an outage falling within a critical
interval will actually be missed. Hence, our prior expectation for the
number of observed outages in t unit years operating experience is

$$\lambda t \left[1 - \frac{m\Delta}{\ell}\, p(\theta)\right].$$

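We now derive the formula for $p(\theta)$. Suppose an outage of length
Z starts in a critical $\Delta$ = 30 day interval and at time x, u time units
from the end of a quarter. See Figure 3.1. Let $Z = \Delta + Y$ where Y is
the excess over 30 days. Then

$$p(\theta) = \frac{1}{\Delta} \int_0^{\Delta} P\{Z \le u + \Delta \mid Z > \Delta,\ \text{outage starts in } \Delta \text{ interval}\}\, du .$$

Under the exponential model the conditional probability in the integrand
is $P\{Y \le u\} = 1 - e^{-u/\theta}$, and

$$p(\theta) = 1 - \frac{\theta}{\Delta}\left(1 - e^{-\Delta/\theta}\right).$$

FIGURE 3.1

DIAGRAM ILLUSTRATING A MISSING OUTAGE
(Diagram label: Critical Interval.)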
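
The closed form for the miss probability can be checked by simulation. The sketch below is ours and rests on the assumptions already made in the text: the start of the outage is uniformly distributed over the critical interval, and the excess over 30 days is exponential with mean theta. The function names are hypothetical.

```python
import math
import random


def p_missed(theta, delta):
    """Closed form: 1 - (theta/delta) * (1 - exp(-delta/theta))."""
    return 1.0 - (theta / delta) * (1.0 - math.exp(-delta / theta))


def p_missed_monte_carlo(theta, delta, n=50_000, seed=0):
    """Estimate the miss probability by direct simulation.

    u is the time remaining to the end of the quarter when the outage starts;
    the outage is missed when its excess y over delta does not exceed u.
    """
    rng = random.Random(seed)
    missed = 0
    for _ in range(n):
        u = rng.uniform(0.0, delta)
        y = rng.expovariate(1.0 / theta)   # exponential with mean theta
        if y <= u:
            missed += 1
    return missed / n


theta_hat = 7.5768e4 / 72   # T/k in hours, using the values quoted in the text
delta = 720.0               # 30 days in hours
print(p_missed(theta_hat, delta))
print(p_missed_monte_carlo(theta_hat, delta))
```

With the values of T and k quoted in the numerical example, the closed form gives a miss probability close to the 0.276 reported there.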


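The observed number of outages is also Poisson with parameter
$\lambda t\left[1 - \frac{m\Delta}{\ell}\, p(\theta)\right]$. This can be
shown from first principles by successively conditioning and unconditioning
starting with the formula

$$P[N(t) = k \mid \lambda,\theta,t,\text{STOP}]
= \sum_{n=k}^{\infty} \sum_{j=n-k}^{n} \frac{(\lambda t)^n e^{-\lambda t}}{n!}
\binom{n}{j} \left(\frac{m\Delta}{\ell}\right)^{j} \left(1 - \frac{m\Delta}{\ell}\right)^{n-j}
\binom{j}{n-k}\, p(\theta)^{n-k}\, [1 - p(\theta)]^{\,j-n+k},$$

where n is the number of outages occurring, j is the number of these
starting in critical intervals, and n - k is the number missed.

By straightforward algebra, we obtain

$$P[N(t) = k \mid \lambda,\theta,t,\text{STOP}]
= \frac{(\lambda t)^k \left[1 - \frac{m\Delta}{\ell}\, p(\theta)\right]^k
e^{-\lambda t\left[1 - \frac{m\Delta}{\ell}\, p(\theta)\right]}}{k!} .$$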
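
One way to carry out the algebra is sketched below. The shorthand $q$ for the overall probability that an outage is missed (the probability of starting in a critical interval multiplied by $p(\theta)$) is our own and is introduced only for this illustration; the sketch starts from the step at which each of the $n$ outages is independently missed with probability $q$.

```latex
P[N(t) = k \mid \lambda, \theta, t, \text{STOP}]
  = \sum_{n=k}^{\infty} \frac{(\lambda t)^n e^{-\lambda t}}{n!}
    \binom{n}{k} (1-q)^k q^{\,n-k}
  = \frac{[\lambda t (1-q)]^k e^{-\lambda t}}{k!}
    \sum_{r=0}^{\infty} \frac{(\lambda t q)^r}{r!}
  = \frac{[\lambda t (1-q)]^k e^{-\lambda t (1-q)}}{k!} .
```

The middle step substitutes $r = n - k$ and uses $\binom{n}{k}/n! = 1/(k!\, r!)$; the last step sums the exponential series to $e^{\lambda t q}$.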

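The likelihood for $\lambda$ and $\theta$, given k observed such outages
with excess durations $y_1, y_2, \ldots, y_k$ and stopping rule STOP, is

$$L(\lambda,\theta \mid k, y_1, \ldots, y_k, t, \text{STOP})
\propto \lambda^k \left[1 - \frac{m\Delta}{\ell}\, p(\theta)\right]^k
e^{-\lambda t\left[1 - \frac{m\Delta}{\ell}\, p(\theta)\right]}\, \theta^{-k} e^{-T/\theta}$$

since conditional on k outages in $\ell$,

$$L(\theta \mid k, T) \propto \theta^{-k} e^{-T/\theta}$$

where $T = \sum_{i=1}^{k} y_i$. Note that conditional on k, (k,T) is
sufficient for $\theta$ under the exponential model.

Were the stopping rule STOP, given data, noninformative, the likelihood
would have been

$$\lambda^k e^{-\lambda t}\, \theta^{-k} e^{-T/\theta} .$$

In this case, we would have estimated $\lambda$ and $\theta$ by
$\lambda^* = k/t$ and $\theta^* = T/k$. However, a closer approximation to
the MLE's can be found by using $\hat{\theta} = T/k$ and calculating the
value $\hat{\lambda}$ for which

$$\lambda^k\, e^{-\lambda t\left[1 - \frac{m\Delta}{\ell}\, p(\hat{\theta})\right]}$$

is maximum. This approximate MLE, $\hat{\lambda}$, is

$$\hat{\lambda} = \frac{k}{t\left[1 - \frac{m\Delta}{\ell}\, p(\hat{\theta})\right]} .$$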

[See  DeGroot  (1970,  p.  199)  for  Bayesian  justification  for  MLE.] 


Numerical Example:

For the data in Table 2.1, k = 72, $T = 7.5768 \times 10^4$ hours and
t = 492.5 unit years so that $\lambda^* = 0.146$ per unit year. On the other
hand, $\hat{\theta} = T/k$ and $p(\hat{\theta}) = 0.276$ so that

$$\hat{\lambda} = \frac{72}{492.5\left[1 - \frac{m\Delta}{\ell}(0.276)\right]}
= \frac{72}{(492.5)(0.911)} = 0.160 .$$

This is an increase of more than 9% over $\lambda^*$!
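
The arithmetic of the numerical example can be retraced as follows. This is a hedged sketch, not the authors' computation: the length of the calendar period is our assumption (1643 calendar days from January 1, 1976 through June 30, 1980, less 30 days), so the correction factor it produces need not agree with the quoted one in the third decimal place. The function name `approximate_mle` is hypothetical.

```python
import math


def approximate_mle(k, T, t, frac_critical, delta):
    """Return (naive rate, theta_hat, p(theta_hat), corrected rate).

    frac_critical is the probability that an outage starts in a critical
    interval; delta and T must be in the same time units.
    """
    lam_star = k / t                                   # ignores the stopping rule
    theta_hat = T / k
    p_hat = 1.0 - (theta_hat / delta) * (1.0 - math.exp(-delta / theta_hat))
    lam_hat = k / (t * (1.0 - frac_critical * p_hat))  # corrected for missed outages
    return lam_star, theta_hat, p_hat, lam_hat


# Assumption: calendar period of 1643 - 30 days, with 17 critical 30 day intervals.
ELL_DAYS = 1643.0 - 30.0
lam_star, theta_hat, p_hat, lam_hat = approximate_mle(
    k=72, T=7.5768e4, t=492.5, frac_critical=17 * 30.0 / ELL_DAYS, delta=720.0
)
print(lam_star, theta_hat, p_hat, lam_hat)
```

Under this assumption the corrected rate comes out near 0.16 per unit year, roughly a tenth larger than the naive rate k/t.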
4. STOPPING RULES USED IN LIFE TABLE ANALYSIS

Breslow and Crowley (1974) and Lindley (1979) studied the following
model relative to estimating death rates for specified age intervals.
Associated with each individual is a pair of independent random quantities,
X, the lifetime and Y, the withdrawal time. The raw observations for
an individual are Z = Min (X,Y), the time at which he leaves either
through death or withdrawal, and an indicator which says whether the
departure was caused by death or withdrawal. The time scale is then divided
into nonoverlapping intervals and the Z's grouped so that observation is
only made on the interval within which he left the system. The quantities
for different individuals are judged independent and identically distributed.
Consequently, if N individuals are present at the beginning of an interval,
the data consists of D, the number who were observed to die in the
interval; W, the number observed to withdraw alive during the interval; and S,
the number who survived to enter the next interval. We consider only the
single interval $[0,\Delta]$. Let X have distribution F and let
$\phi = F(\Delta)$ be the random quantity of interest. Let Y have
distribution H, $\theta = H(\Delta)$ and

$$p = \int_0^{\Delta} F(x)\, dH(x) \Big/ \theta\phi$$

so that p is the conditional probability that a death is observed, given
that both withdrawal and death take place in the interval. As Lindley
(1979) points out, the likelihood for $\phi$, $\theta$ and p can be
calculated given D, S and W. Obviously, $\theta$ and p are nuisance
parameters.

Since observation on a unit ceases at Min $(X,Y,\Delta)$ and Y is random,
the stopping rule is random. Is the stopping rule informative?

Given the stopping rule, W provides partial information about $\phi$.
Note that $W = D^* + S^*$ where $D^*$ is the number of withdrawals that
would have been observed to die in $[0,\Delta]$ had they not been withdrawn
and similarly for $S^*$. The probability that an individual will die in
$[0,\Delta]$ and his death will not be observed is $\theta\phi[1 - p]$,
which clearly depends on $\phi$. The likelihood as calculated by Lindley
(1979) is

$$L(\phi,\theta,p \mid D,S,W)
\propto \phi^{D} (1 - p\phi)^{W} (1 - \phi)^{S}
\{1 - (1 - p)\theta\}^{D}\, \theta^{W} (1 - \theta)^{S} .$$

Since $\theta$ and p are nuisance parameters, they must be integrated out
with respect to a joint prior for $(\phi,\theta,p)$.
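
A numerical version of this integration can be sketched as follows. The sketch is ours: it assumes independent uniform priors on all three parameters, uses a crude midpoint grid, and is run on hypothetical counts D, W and S chosen only for illustration. The cell probabilities are those implied by the model: an observed death, an observed withdrawal, or survival to the end of the interval.

```python
import math


def cell_probabilities(phi, theta, p):
    """Probabilities of observed death, observed withdrawal, and survival."""
    death = phi * (1.0 - (1.0 - p) * theta)
    withdrawal = theta * (1.0 - p * phi)
    survival = (1.0 - phi) * (1.0 - theta)
    return death, withdrawal, survival


def log_likelihood(phi, theta, p, D, W, S):
    """Log of the multinomial kernel for counts D, W, S."""
    pd, pw, ps = cell_probabilities(phi, theta, p)
    return D * math.log(pd) + W * math.log(pw) + S * math.log(ps)


def marginal_posterior_phi(D, W, S, grid=40):
    """Posterior mass for phi on a midpoint grid, with theta and p summed out.

    Assumes independent uniform priors on (0, 1) for phi, theta and p.
    """
    pts = [(i + 0.5) / grid for i in range(grid)]
    weights = []
    for phi in pts:
        total = 0.0
        for theta in pts:
            for p in pts:
                total += math.exp(log_likelihood(phi, theta, p, D, W, S))
        weights.append(total)
    norm = sum(weights)
    return pts, [w / norm for w in weights]


# Hypothetical counts, for illustration only.
grid_pts, post = marginal_posterior_phi(D=5, W=3, S=12)
print(sum(g * w for g, w in zip(grid_pts, post)))   # posterior mean of phi
```

The three cell probabilities sum to one for any parameter values, which is a convenient check that the factors of the likelihood have been grouped correctly.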

Were we given the ages of death $(x_1, \ldots, x_k)$, the survival and
withdrawal ages $(\ell_1, \ldots, \ell_n)$ and were the parameter of
interest the force of mortality $[r(u), u \ge 0]$, then the likelihood
would be

$$L(r(u), u \ge 0 \mid x_1, \ldots, x_k, \ell_1, \ldots, \ell_n)
\propto \left[\prod_{i=1}^{k} r(x_i)\right]
\exp\left[-\int_0^{\infty} n(u)\, r(u)\, du\right]$$

where n(u) is the number of individuals observed surviving to age u.
In this case, the stopping rule would be noninformative, since the
posterior for $[r(u), u \ge 0]$ would not depend on the stopping rule.
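
For a constant force of mortality the calculation becomes especially simple. The sketch below is ours and assumes the general sampling plan form of the likelihood, in which each observed death contributes a factor r(x) and the exposure enters through the integral of n(u) r(u); with r(u) = r this reduces to r^k exp(-r * TTT), where TTT, the integral of n(u), is the total time on test. It also assumes every individual enters observation at age 0. The ages in the usage lines are hypothetical.

```python
def total_time_on_test(death_ages, censor_ages):
    """Integrate n(u), the number observed surviving to age u, over all ages."""
    ages = sorted(list(death_ages) + list(censor_ages))
    total, prev = 0.0, 0.0
    remaining = len(ages)          # individuals still under observation
    for a in ages:
        total += remaining * (a - prev)
        prev = a
        remaining -= 1
    return total


def constant_hazard_mle(death_ages, censor_ages):
    """Maximizer of r**k * exp(-r * TTT): deaths divided by total time on test."""
    return len(death_ages) / total_time_on_test(death_ages, censor_ages)


deaths = [2.0, 5.0]      # hypothetical ages at death
censored = [3.0, 7.0]    # hypothetical survival and withdrawal ages
print(total_time_on_test(deaths, censored))
print(constant_hazard_mle(deaths, censored))
```

The estimate depends on the withdrawal ages only through the total time on test, which is the sense in which the stopping rule drops out of the inference in this case.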


5. CONCLUSION

In analyzing data, it is important to think about the way in which the
data were obtained. If the stopping rule, given data, is informative in the
sense defined in Section 1, the likelihood calculation may be more difficult,
but resulting estimates may differ significantly from those which ignore the
stopping rule. Whether or not the information contained in the stopping rule
is relevant depends on the observed data as well as the model, the parameter
of interest and its prior probability, as the previous examples demonstrate.

